Software solution for Entropy Decoding on TM32 cores

Author

  • Arjen Westerterp
Abstract

Delft University of Technology, Faculty of Electrical Engineering, Mathematics and Computer Science, CE-MS-2003-13.

Entropy Decoding is an essentially sequential task. Executing it on a processor that relies on Instruction Level Parallelism (ILP), Data Level Parallelism (DLP), or both requires an efficient implementation. Entropy Decoding is the part of MPEG-2 Decoding that exploits the least parallelism, so creating more parallelism in it is expected to speed up MPEG-2 Decoding significantly. When writing an efficient MPEG-2 Decoder, the result must conform to the Berkeley MPEG-2 Decoder. In this thesis the Entropy Decoding process is optimized by compressing the data stream from the Lookup Table to the Variable Length Decoder. In addition, the number of branches in the Entropy Decoding process is reduced. This increases the available ILP, which is exploited further by preloading data. The result of these three optimization steps is evaluated on the TriMedia32 core. The number of instruction cycles needed to decode a coefficient drops from 23 to 16, a reduction of 30%. In addition, an exceptionally high VLIW slot occupancy of more than 85% shows that the Entropy Decoding process is highly optimized. This result is integrated into the MPEG-2 Decoder written for SpaceCAKE. The resulting MPEG-2 Decoder has the same DLP properties as the original decoder. Despite the patch needed to conform to the data management of the multi-threaded MPEG-2 decoder, the improvement is still significant: 12% on average, with a peak of 20%. These figures represent a lower bound on the performance improvement. The data management structure of the Berkeley MPEG-2 Decoder would have to be rewritten for the optimized Entropy Decoding process to reach its full performance in this decoder.

In conclusion, the optimized MPEG-2 Decoder gives a significant performance improvement, obtained by increasing the efficiency of the compiled Entropy Decoder source code and thereby optimizing the Entropy Decoding process.
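
The three optimization steps named in the abstract (a compressed lookup-table entry, fewer branches, and preloading) can be illustrated with a minimal C sketch. This is not the thesis code: the packed entry layout, the toy prefix code in `toy_lut`, and the names `peek3`, `RunLevel` and `decode_block` are all assumptions made for illustration, and the table is not the MPEG-2 VLC table. The input buffer is assumed to carry two padding bytes after the last code so that the preload never reads out of bounds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packed entry: level in bits 8-15, run in bits 4-7,
   code length in bits 0-3. A level of 0 marks end of block. */
#define ENTRY(level, run, len) \
    ((uint16_t)(((level) << 8) | ((run) << 4) | (len)))

/* Toy prefix code (NOT the MPEG-2 tables), indexed by the next 3 bits:
   1 -> (run 0, level 1), 01 -> (0, 2), 001 -> (1, 1), 000 -> end of block. */
static const uint16_t toy_lut[8] = {
    ENTRY(0, 0, 3),                 /* 000: end of block   */
    ENTRY(1, 1, 3),                 /* 001: run 1, level 1 */
    ENTRY(2, 0, 2), ENTRY(2, 0, 2), /* 01x: run 0, level 2 */
    ENTRY(1, 0, 1), ENTRY(1, 0, 1), /* 1xx: run 0, level 1 */
    ENTRY(1, 0, 1), ENTRY(1, 0, 1),
};

typedef struct {
    uint8_t run;
    uint8_t level;
} RunLevel;

/* Branch-free peek of the 3 bits starting at bitpos (MSB first).
   Reads two bytes, hence the padding requirement on the buffer. */
static unsigned peek3(const uint8_t *buf, size_t bitpos)
{
    size_t byte = bitpos >> 3;
    unsigned shift = (unsigned)(bitpos & 7);
    uint32_t window = ((uint32_t)buf[byte] << 8) | buf[byte + 1];
    return (window >> (13 - shift)) & 7u;
}

/* Decodes run/level pairs until end of block or until max_pairs is reached.
   Returns the number of pairs written and advances *bitpos. */
size_t decode_block(const uint8_t *buf, size_t *bitpos,
                    RunLevel *out, size_t max_pairs)
{
    assert(buf && bitpos && out);
    size_t pos = *bitpos;
    size_t n = 0;
    uint16_t next = toy_lut[peek3(buf, pos)]; /* preload the first entry */

    while (n < max_pairs) {
        uint16_t cur = next;
        pos += cur & 0xF;                  /* consume the code, no branch   */
        next = toy_lut[peek3(buf, pos)];   /* preload before cur is used    */
        if ((cur >> 8) == 0)               /* only data-dependent branch    */
            break;
        out[n].run = (uint8_t)((cur >> 4) & 0xF);
        out[n].level = (uint8_t)(cur >> 8);
        n++;
    }
    *bitpos = pos;
    return n;
}
```

Packing run, level and length into one 16-bit word means a single load delivers everything the loop needs, and issuing the next table lookup before the current entry is unpacked gives a VLIW scheduler independent work to fill its slots. Whether this matches the thesis's actual table layout is not stated in the abstract.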

Similar resources

Entropy Decoding on TriMedia/CPU64

The paper describes a software implementation of an MPEG-compliant Entropy Decoder on a TriMedia/CPU64 processor. We first outline entropy decoding basics and TriMedia/CPU64 architecture. Then, we describe the reference implementation of the entropy decoder, which consists mainly of a software pipelined loop. On each iteration, a set of look-up tables partitioning the Variable-Length Codes (VLC)...

Fast, Scalable Phrase-Based SMT Decoding

The utilization of statistical machine translation (SMT) has grown enormously over the last decade, with many users relying on open-source software developed by the NLP community. As commercial use has increased, there is a need for software that is optimized for commercial requirements, in particular, fast phrase-based decoding and more efficient utilization of modern multicore servers. In this paper we re-exam...

Numerical and Analytical Approach for Film Condensation on Different Forms of Surfaces

This paper seeks solutions to condensation problems around a flat plate and around circular and elliptical tubes, using numerical and analytical methods. It also calculates entropy production rates. First, a problem was solved with mesh dynamic and rational assumptions; next, the result was compared with the numerical solution and showed acceptable errors. An additional suppor...

FR-V Single-Chip Multicore Processor: FR1000

To realize the low power consumption and low-cost equipment needed to decode high definition broadcasts, Fujitsu has developed a single-chip multicore processor FR1000 that integrates four 8-way, Very Long Instruction Word (VLIW) FR-V processor cores. This new multicore processor is fabricated using a 90 nm, nine-metal-layer CMOS process and a 900-pin flip-chip package. The processor cores oper...

A Software-Programmable Multiple-Standard Radio Platform

Future wireless terminals will have to be multiband, multi-standard and able to execute multiple standards concurrently. In this paper we describe a flexible and programmable baseband platform for a large variety of mobile and WLAN standards. For the SDR platform architecture our primary design goal was to find the most flexible and easy-to-program solution within a specified power budget. The ...

Publication date: 2003